# TUTORIAL 6 : Exercise 3¶

Author: | Andrey V. Brukhno (andrey.brukhno{at}stfc.ac.uk) |
---|

## Umbrella sampling in windows with the use of WHAM method¶

With rough energetic landscapes and large variations in free energy it may be computationally very costly to achieve reasonable precision over the entire range of interest in a single simulation run, e.g. with EE or WL methods. In such intricate cases the classic *umbrella sampling in sub-ranges, or windows,* can be more robust.

That is, the order parameter range is subdivided into relatively narrow “windows” where a *window-specific* harmonic biasing potential, \(U(\lambda)=\frac{k_f}{2}(\lambda-\lambda_0)^2\), is applied (which creates an umbrella-like probability distribution over each particular window). Clearly, a biased simulation restrained in such a way focuses on sampling in the vicinity of its \(\lambda_0\) value, whereby improving on the accuracy of obtained FED data.

**NOTE:** the restraint associated with an umbrella (window) has no hard boundaries to it; rather the resulting (biased) probability distribution decays smoothly away from the window central point. (The only possible exclusion from this rule might be the two outmost windows.)

There is, however, one complication: one needs to correctly relate FED fragments corresponding to different windows with respect to each other. This is where the “magic” of **weighted histogram analysis method (WHAM)** comes in handy.

Let us see how it works in the case of two charged nanoparticles from the previous exercises (TUTORIAL 6 : Exercise 2).

## Exercise 3.1¶

Change to directory **tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2/**, where the input files for this exercise are found.

The FIELD file is still the same as in TUTORIAL 6 : Exercise 2, but the CONTROL file has been amended:

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ diff CONTROL ../../tutorial6-2/FED_HSR10_Q20_EDL2/CONTROL
< use repexch 4 0.0 1000
---
> use repexch 4 50.0 500
11,12c11
< fed method US 22.0 2.0 500000
< #fed method WL 0.004 0.50 500000 3 #3 398 0.5 #0.7071068 100000
---
> fed method WL 0.004 0.50 500000 3 #3 398 0.5 #0.7071068 100000
16c15
< fed order param com2 200 20. 30. 1
---
> fed order param com2 200 20. 40. 1
33c32
< steps 2000000 #16000000
---
> steps 8000000 #16000000
```

**NOTE:** For clarity we do not use replica-exchange with temperature variation (so \(\Delta T=0.0\)), yet we will be running a parallel job with 4 threads to speed things up.

You can notice that the ‘fed method WL’ directive has been replaced by the ‘fed method US’ record with three parameters given: \(R_0=22.0\) (Angstrom), \(2k_f=2.0 (kT/A^2)\) and the frequency of printing out FED data into FEDDAT files (0.5 million MC steps).

The input provided is for the first US window centered at \(R_0\). You will need to amend the CONTROL file for each subsequent run, shifting the window in steps of 2 Angstrom.

Run the simulation through the queue (*sbatch parallel.sub*), and check the final **total** data accumulated over all 4 replicas - FEDDAT.TOT_004 (only created when \(\Delta T=0.0\)):

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ gnuplot
gnuplot> plot 'FEDDAT.000_004' u 1:2 w l lc 'black' t "US window 1, 2 mln steps"
```

**NOTE:** only a small portion of the range (approximately 20 - 24 A) has been covered in the simulation, whereas outside the sampled window the data represent the pure biasing potential taken with minus sign (a few spikes reveal a quick drift from the intial separation *R=30* A).

Let us zoom in:

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ gnuplot
gnuplot> plot [x=20:25] [y=0:8] 'FEDDAT.000_004' u 1:2 w l lc 'black' t "US window 1, 2 mln steps"
```

Finally, let us check the umbrella probability histogram:

```
gnuplot> plot 'FEDDAT.000_004' u 1:3 w l lc 'black' t "US umbrella 1, 2 mln steps"
```

Store away the data in a newly-created subdirectory:

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ mkdir equil-us1/
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ cp CON* FIELD equil-us1/
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ mv H* FED* TR* R* S* P* O* equil-us1/
```

Now increase \(R_0\) by 2 A in the CONTROL file and proceed repeating this protocol another 3 times…

After all four simulations are finished and stored in ‘equil-us#’ subdirectories (hash standing for the window index), you should have 4 directories: ‘equil-us1’, ‘equil-us2’, ‘equil-us3’, ‘equil-us4’, each containing the simulation results a particular US window. You can check the FED data altogether as follows:

```
gnuplot> plot 'equil-us1/FEDDAT.TOT_004' u 1:2 w l lc 'black' t "US window 1, 2 mln steps", \
'equil-us2/FEDDAT.TOT_004' u 1:2 w l lc 'red' t "US window 2, 2 mln steps", \
'equil-us3/FEDDAT.TOT_004' u 1:2 w l lc 'green' t "US window 3, 2 mln steps", \
'equil-us4/FEDDAT.TOT_004' u 1:2 w l lc 'blue' t "US window 4, 2 mln steps"
```

and similarly for the histograms:

```
gnuplot> plot 'equil-us1/FEDDAT.TOT_004' u 1:3 w l lc 'black' t "US window 1, 2 mln steps", \
'equil-us2/FEDDAT.TOT_004' u 1:3 w l lc 'red' t "US window 2, 2 mln steps", \
'equil-us3/FEDDAT.TOT_004' u 1:3 w l lc 'green' t "US window 3, 2 mln steps", \
'equil-us4/FEDDAT.TOT_004' u 1:3 w l lc 'blue' t "US window 4, 2 mln steps"
```

We see that it would be rather non-trivial to “stitch” the four FED lines together unambigously, so as to reconstruct the overall FED profile in the entire simulated range. This is despite virtually noiseless FED data in the mid region of each window and quite reasonable overlaps between neighbouring umbrellas. Lastly, some data - where the noise is unacceptably high - would have to be discarded in the process…

**Remarkably, WHAM not only uses all of the collected stats but also does a great job of optimally and seamlessly stitching FED fragments to produce a single smooth FED profile!** - Fortunately, DL_MONTE package provides a Python script, called *dlm_wham.py3*, to do this nice job for you.

Before we could use the WHAM script on our data, we have to prepare an extra input file (‘WINDOWS_HUS’) - storing the US parameters from all the windows:

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ dlm-prep-wham.bsh "equil-us?" > WINDOWS_HUS
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ less WINDOWS_HUS
#Umbrella sampling windows for WHAM (dlm-wham.py)
22.0 2.0
24.0 2.0
26.0 2.0
28.0 2.0
```

Now we are ready to do the WHAM magic:

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ dlm_wham.py3 -d "equil-us?" -i "FEDDAT.TOT_004"
==========================
Pattern for input files : 'equil-us?/FEDDAT.TOT_004' (see below)
Prefix for output files : './WHAMDAT' (with suffices appended)
End bins skipped per set : 0
==========================
List of input files :
--------------------------
0 : WINDOWS_HUS (parameters)
1 : equil-us1/FEDDAT.TOT_004
2 : equil-us2/FEDDAT.TOT_004
3 : equil-us3/FEDDAT.TOT_004
4 : equil-us4/FEDDAT.TOT_004
==========================
Collecting the input data...
--------------------------
starting arrays in range 1 : 20.075 29.975 0 199 199 199
found zero by the end of set 180 29.075 199 - skipping the rest!
extending arrays in range 2 : 21.775 29.975 0 165 165 199
found zero by the end of set 172 29.225 149 199 - skipping the rest!
extending arrays in range 3 : 23.475 29.975 0 131 131 199
found zero by the end of set 172 29.375 118 199 - skipping the rest!
extending arrays in range 4 : 25.575 29.975 0 89 89 199
==========================
Running WHAM iteration...
--------------------------
WHAM done after 272 iterations: SUM_win{ [ln(Z_new) - ln(Z_old)]^2 } = 9.87797065360853e-11
--------------------------
Storing WHAM FED data : 0 ... 199 bins
==========================
```

which generates 4 output files:

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ ls WHAMDAT_*
WHAMDAT_FED.out WHAMDAT_PDF.inp WHAMDAT_PDF.out WHAMDAT_WGT.dat
```

We are mostly interested in looking at **WHAMDAT_FED.out**:

```
[tutorial6-3/FED_HSR10_Q20_EDL2-US_1kToA2]$ gnuplot
gnuplot> plot [x=20:30] [y=-2:6] 'WHAMDAT_FED.out' u 1:2 w l lc 'red' t "US umbrellas 1-4, 2 mln steps"
```

Voila! All the noisy portions of FED fragmets disappeared and one nicely stitched and smooth FED profile has been produced. Obviously, there is room for improvement by the edges of the range, but that’s where we don’t have any additional data currently.

Once again, one has two options to get better accuracy by the edges:

- Run longer simulations in the two outmost windows;
- Add extra windows by the ends, say with \(R_0=21\) and \(29\) Angstroms.

Here is the FED profile one gets by increasing the simulation length in each window up to 5 million steps:

## Extra exercises¶

**Try to extend the range up to 40 Angstrom by adding extra windows beyond the current range****Try to run US/RE combination with parallel tempering, i.e. temperature variation enabled**(look in FED_HSR10_Q20_EDL2-US_1kToA2_Tx4)

Or move on to the next TUTORIAL 7 : Introduction to lattice-switch Monte Carlo

## References¶

- S.Kumar et al, J. Comp. Chem. (1992) 13, 1011
- J.Kastner, WIREs Comput. Mol. Sci. (2011) 1, 932